A Quantitative Study of Video Duplicate Levels in YouTube
Abstract
The popularity of video sharing services has increased exponentially in recent years, but this popularity is accompanied by challenges associated with the tremendous scale of user bases and massive amounts of video data. A known inefficiency of video sharing services with user-uploaded content is widespread video duplication. These duplicate videos often have different aspect ratios, can contain overlays or additional borders, or can be excerpted from a longer, original video, and thus can be difficult to detect. The proliferation of duplicate videos can have an impact at many levels, and accurate assessment of duplicate levels is a critical step toward mitigating their effects on both video sharing services and network infrastructure. In this work, we combine video sampling methods, automated video comparison techniques, and manual validation to estimate duplicate levels within large collections of videos. The combined strategies yield an estimated video duplicate ratio of 31.7% across all YouTube videos, with 24.0% of storage occupied by duplicates. These high duplicate ratios motivate further examination of the systems-level tradeoffs between video deduplication and storing large numbers of duplicates.
Similar Resources
The 'WeTube' in YouTube - creating an online community through video sharing
Video sharing has become a growing social practice, with YouTube being the predominant online video sharing site. Most of the research concerning YouTube’s social impact has been focused on quantitative evaluation of the social interaction facilitated by the tools embedded in the site. This study aims to explore the growth of the YouTube online community through the eyes of YouTube users who au...
SMART Overlay: Duplicate Traffic Elimination
The Internet consumes 5% of worldwide energy. The fact is that 90% of Internet traffic is video and mostly “redundant.” As an example, 10% of the most popular videos account for 90% of total views at YouTube. As a result, redundant data are repeatedly transmitted over the Internet. Our challenge is to design the first traffic deduplication technique for more efficient network communications betwee...
Human Perception of Near-Duplicate Videos
Popular content in video sharing websites (e.g., YouTube) contains many duplicates. Most scholars define near-duplicate video clips (NDVC) as identical videos with variations on non-semantic features (e.g., image/audio quality), while a few others also include semantic features (different videos of similar content). However, it is unclear what exact features contribute to human perception of si...
YouTube Politics: YouChoose and Leadership Rhetoric during the 2008 Election
The present study employs both qualitative and quantitative research methods to examine the discourse of leadership in the YouTube video clips of 16 candidates who competed in the 2008 U.S. presidential race. The introduction and farewell videos of the candidates included on the YouChoose portion of YouTube are inductively analyzed for leadership utterances. Common categories are constructed th...
Understanding the Characteristics of Category-Specific YouTube Videos
As the world's largest video content sharing website, YouTube constantly attracts more and more attention from the networking research community. Different aspects of YouTube, such as video characteristics, user behaviors, and its back-end infrastructure, have been well studied. Most of these studies consider YouTube as a whole without taking the YouTube category into consideration. However, it is...
Publication date: 2015